To obtain information about the structure and evolution of the nucleocapsid (N) protein of the coronavirus mouse hepatitis virus (MHV), we determined the entire nucleotide sequences of the N genes of MHV-A59, MHV-3, MHV-S, and MHV-1 from cDNA clones. At the nucleotide level, the N gene sequences of these viral strains, and that of MHV-JHM, were more than 92% conserved overall. Even higher nucleotide sequence identity was found in the 3’ untranslated regions (3’ UTRs) of the five strains, which may reflect the role of the 3’ UTR in negative-strand RNA synthesis. All five N genes were found to encode markedly basic proteins of 454 or 455 residues having at least 94% sequence identity in pairwise comparisons. However, amino acid sequence divergences were found to be clustered in two short segments of N, putative spacer regions that, together, constituted only 11% of the molecule. Thus, the data suggest that the MHV N protein is composed of three highly conserved structural domains connected to each other by regions that have much less constraint on their amino acid sequences. The first two conserved domains contain most of the excess of basic amino acid residues; by contrast, the carboxy-terminal domain is acidic. Finally, we noted that four of the five N genes contain an internal open reading frame that potentially encodes a protein of 207 amino acids having a large proportion of basic and hydrophobic residues.


Coronaviruses are a family of enveloped, single-stranded, positive-sense RNA viruses that are important respiratory, neurologic, and enteric pathogens of humans and domestic animals (1). Having the largest genomic coding capacities among RNA viruses (at least 27 kb) as well as a unique strategy of RNA replication, coronaviruses are very unusual and interesting molecular biological entities (2, 3). To gain insight into the roles played by the coronavirus nucleocapsid (N) protein during viral infection, we have been characterizing this protein in the well-studied coronavirus mouse hepatitis virus (MHV).

One approach to understanding protein structure and function is to chart evolutionarily permissible changes among closely related proteins. To this end, we have cloned and sequenced the N genes of four strains of MHV: MHV-A59, MHV-3, MHV-S, and MHV-1. Although closely related, these viruses have distinct histories, most notably separate times and geographic loci of isolation and different mouse strains of origin (4-7). In addition, the N proteins of these MHV strains exhibit considerable variation in electrophoretic mobility on sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) (data not shown; Refs. 8-10), suggesting differences in protein size or amino acid composition.

Nucleotide sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession Nos. M35253 (MHV-3), M35254 (MHV-1), M35255 (MHV-S), and M35256 (MHV-A59).

Multiple cDNA clones, prepared from poly(A)-containing RNA from infected mouse 17 clone 1 cells, were used to determine the nucleotide sequences of the N genes of MHV-A59, MHV-1, MHV-3, and MHV-S. MHV-A59 was taken as our reference strain because its sequence had been previously reported (11, 12) and because our heat-stable variant of this strain is the parent of a number of temperature-sensitive mutants that we plan to characterize in future work (L. S. Sturman et al., manuscript in preparation). With the exception of the final 71 nt of the 3’ untranslated region (3’ UTR; see Fig. 1), the entire sequence of this N gene was determined in both directions at least once. At all positions where our sequence differed from the previously reported sequence (nt 441, 784, 1317, 1399, and 1481-1483), we verified the difference on at least four additional independent cDNA clones.

Similarly, the entire N gene sequences of MHV-1, MHV-3, and MHV-S were determined in both directions at least once. At all positions where a difference occurred with respect to our prototypic MHV-A59 sequence (Fig. 1), this change was verified on at least two additional independent cDNA clones. All cDNA clones were in agreement at all positions examined, with the following exceptions: nt 1317 of MHV-A59, for which T was read on four clones and A on one clone; nt 1279 of MHV-3, T on five clones and C on one clone; nt 69 of MHV-S, A on three clones and T on one clone; nt 293 of MHV-S, C on three clones and T on one clone; nt 416 of MHV-S, C on three clones and T on one clone; nt 638 of MHV-S, A on four clones and G on one clone; and nt 134 of MHV-1, G on two clones and T on one clone. Thus, for these seven nucleotides, the bases given in Fig. 1 represent consensus sequences. The apparent disagreements at these positions most likely reflect the error rate either of the MHV RNA-dependent RNA polymerase, which generated the original transcripts, or of reverse transcriptase, which was used in the construction of the cDNA clones.
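The consensus calls above amount to a column-wise majority vote across independent clones. A minimal Python sketch of that rule, assuming the clone reads have already been aligned to equal length; the function names are hypothetical and are not taken from the original work:

```python
from collections import Counter


def consensus_base(reads):
    """Return the majority base at one position, or "N" if the top counts tie."""
    ranked = Counter(reads).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "N"  # no majority: leave the position ambiguous
    return ranked[0][0]


def consensus_sequence(aligned_clones):
    """Column-wise majority vote over equal-length, pre-aligned clone sequences."""
    if len({len(s) for s in aligned_clones}) != 1:
        raise ValueError("clone sequences must be aligned to equal length")
    return "".join(consensus_base(column) for column in zip(*aligned_clones))


# nt 1317 of MHV-A59: T on four clones, A on one clone
print(consensus_base("TTTTA"))  # T
# nt 134 of MHV-1: G on two clones, T on one clone
print(consensus_base("GGT"))  # G
```

A tie is reported as N rather than resolved arbitrarily; with only two clones at a position, a single discordant read is enough to produce one.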

An alignment of the four determined MHV nucleotide sequences, together with the previously reported N gene sequence of MHV-JHM (16), is presented in Fig. 1. All five N genes share more than 92% nucleotide sequence identity. In pairwise comparisons, the two most similar sequences are those of MHV-A59 and MHV-3; the most distant are those of MHV-1 and either MHV-A59 or MHV-3 (Table 1). The greatest densities of nucleotide differences among the N genes are in two regions corresponding to nt 414-486 and nt 1141-1214 of the MHV-A59 sequence. For the most distant strains, 50% of the nucleotide differences are clustered in these segments, which, combined, represent less than 8% of either sequence. By contrast, the most conserved portion of the N genes occurs in the 3’ untranslated regions (UTRs), which diverge by no more than 3 nt over a total span of 301 nt. This degree of sequence identity, which exceeds that of any portion of the N gene coding region, may reflect some functional constraint on the 3’ UTR, which presumably acts as a recognition site for the viral RNA polymerase during negative-strand RNA synthesis.
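Locating the two high-density regions and quoting identities such as "no more than 3 nt over 301 nt" both come down to counting differences between aligned sequences. The Python sketch below is illustrative only: it assumes a pairwise alignment is already in hand, follows the Table 1 convention that a gap opposite a base counts as a difference, and uses a simple sliding window. The function names and the windowing approach are assumptions, not the authors' procedure.

```python
def count_differences(a, b):
    """Count aligned columns that differ; a gap ('-') opposite a base counts as a difference."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Columns gapped in both sequences compare equal and so are never counted.
    return sum(x != y for x, y in zip(a, b))


def percent_identity(a, b):
    """Percent identical columns, ignoring columns that are gaps in both sequences."""
    compared = sum(not (x == "-" and y == "-") for x, y in zip(a, b))
    return 100.0 * (compared - count_differences(a, b)) / compared


def difference_profile(a, b, width):
    """Number of differences in each window of `width` columns, keyed by 1-based start."""
    flags = [x != y for x, y in zip(a, b)]
    return [(i + 1, sum(flags[i:i + width])) for i in range(len(flags) - width + 1)]


ref = "ACGTACGTAC"
alt = "ACGAACG-AC"
print(count_differences(ref, alt))  # 2
print(percent_identity(ref, alt))  # 80.0
# A 301-nt region with 3 differences is still about 99.0% identical
print(round(percent_identity("A" * 301, "C" * 3 + "A" * 298), 1))  # 99.0
```

Scanning `difference_profile` along a real N gene alignment and picking the windows with the highest counts is one way to flag clusters like nt 414-486 and nt 1141-1214; the window width is a free parameter.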

TABLE 1
N GENE NUCLEOTIDE AND AMINO ACID SEQUENCE DIFFERENCES

                   Nucleotide differences(a)
MHV strains        Coding region   3' UTR   Amino acid differences(a)
A59 and 3                2            2                1
A59 and S               73            0               18
A59 and 1              102            3               28
A59 and JHM            101            2               30
3 and S                 71            2               17
3 and 1                102            3               29
3 and JHM               99            2               29
S and 1                 65            3               25
S and JHM               60            2               25
1 and JHM               53            1               20

(a) Gaps in the sequence alignments are counted as differences.

An alignment of the deduced amino acid sequences of the N proteins of the five MHV strains is shown in Fig. 2. All five N genes encode proteins of 454 or 455 residues, with calculated molecular masses of 49.6 to 49.7 kDa. Thus, the apparent size differences observed among them on SDS-PAGE (data not shown; Refs. 8-10) probably reflect differences in amounts of bound SDS or residual secondary structure under the electrophoresis conditions. Alternatively, the variation in electrophoretic mobilities may indicate different types or extents of post-translational modification.
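A sequence-derived molecular mass is simply the sum of average residue masses plus one water for the free termini. The Python sketch below uses standard average (not monoisotopic) residue masses; the function name is hypothetical, and the calculation deliberately ignores bound SDS, residual structure, and post-translational modifications, which are the factors the text invokes to explain the mobility differences.

```python
# Average (not monoisotopic) residue masses in daltons, i.e., amino acid mass minus water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def molecular_weight(protein):
    """Average molecular mass (Da) of an unmodified polypeptide from its sequence."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


print(round(molecular_weight("G"), 2))  # 75.07 (free glycine)
```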

All five N proteins possess at least 94% sequence identity in pairwise comparisons (Table 1). All have the salient features noted previously for N of MHV-A59 and MHV-JHM: a large excess of basic residues over acidic residues (calculated pI values of 10.4-10.6); numerous serine residues, some of which are potential phosphorylation targets (17, 18); and an acidic carboxy terminus, in contrast to the rest of the molecule (11, 16).
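A calculated pI is the pH at which the summed Henderson-Hasselbalch charges of all ionizable groups reach zero. The Python sketch below finds that pH by bisection. The pKa table is one commonly used set and is an assumption here: calculators differ, and the source does not say which values were used, so this sketch need not reproduce the reported 10.4-10.6 exactly. Function names are hypothetical.

```python
# pKa values are an assumed, commonly used set; other tables give somewhat different pI values.
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(protein, ph):
    """Net charge at a given pH from Henderson-Hasselbalch fractions of each group."""
    seq = protein.upper()
    positive = 1 / (1 + 10 ** (ph - PKA_BASIC["Nterm"]))
    for aa in "KRH":
        positive += seq.count(aa) / (1 + 10 ** (ph - PKA_BASIC[aa]))
    negative = 1 / (1 + 10 ** (PKA_ACIDIC["Cterm"] - ph))
    for aa in "DECY":
        negative += seq.count(aa) / (1 + 10 ** (PKA_ACIDIC[aa] - ph))
    return positive - negative


def isoelectric_point(protein, tol=1e-4):
    """Bisect on pH: net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(protein, mid) > 0:
            low = mid  # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

A lysine- and arginine-rich composition pushes the zero crossing to high pH, which is the property summarized by the calculated pI values above.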

Fig. 1. Nucleotide sequence comparison of the N genes of five strains of MHV. The heat-stable strain of MHV-A59 used in this study was obtained from Dr. Lawrence Sturman, Wadsworth Center for Laboratories and Research. MHV-3 was from Dr. Kathryn Holmes, Uniformed Services University of the Health Sciences, and, in turn, had been obtained from Dr. Abigail Smith, Yale University. MHV-S and MHV-1 were originally from Dr. John Parker, Microbiological Associates. Libraries of cDNA clones were generated from poly(A)-containing infected cell RNA by a modification of the procedure of Gubler and Hoffman (13) using the vector pMG5, as described previously (14). DNA sequencing was carried out by a variation of the dideoxy chain termination method of Sanger et al. (15) using modified T7 DNA polymerase (Sequenase, U.S. Biochemical). The synthetic oligodeoxynucleotide primers used for sequencing corresponded to nt 77-93, 328-345, 577-597, 827-847, 1077-1094, and 1307-1323, or were complementary to nt 280-297, 580-597, 730-747, 880-897, 1030-1047, 1180-1197, 1330-1347, and 1632-1649 of the MHV-A59 sequence. To obtain sequences of the 5’ and 3’ extremes of genes, cDNA inserts were subcloned into pGEM vectors (Promega), and sequencing was primed with oligodeoxynucleotides corresponding to the SP6 or T7 RNA polymerase promoters. The MHV-A59, -3, -S, and -1 sequences were determined in this work, except for nt 1596-1666 of MHV-A59, which are taken from Armstrong et al. (11). The MHV-JHM N sequence is from Skinner and Siddell (16). Spaces indicate positions for which the nucleotide is identical to that of MHV-A59. Nucleotides are numbered from the first base of the N protein initiation codon; 3’ polyadenylate tails are omitted. Hyphens indicate gaps introduced to maximize the alignment of sequences. The N protein initiation and termination codons are double-underlined. The initiation and termination codons of the major internal open reading frame are single-underlined.

Fig. 2. Amino acid sequence comparison of the N proteins of five strains of MHV. The deduced MHV-A59, -3, -S, and -1 N sequences are from this work. The deduced MHV-JHM N sequence is taken from Skinner and Siddell (16). Spaces indicate positions for which the amino acid is identical to that of MHV-A59. The hyphen indicates a gap introduced to maximize the alignment of sequences. The two clustered regions of amino acid differences are boxed.

As with the nucleotide sequences, the divergences among the amino acid sequences are clustered in two regions, corresponding to amino acids 140-162 and 381-405 of the MHV-A59 sequence. For the most divergent pair of proteins, those of MHV-A59 and MHV-JHM, 63% of the amino acid differences are concentrated in these two portions of N, which together make up only 11% of the molecule. This distribution of residue changes, shown graphically in Fig. 3, suggests a model for the MHV N protein in which three conserved structural domains (basic, basic, and acidic) are tethered to each other by two regions of variable amino acid composition (designated A and B). We suggest that A and B have less constraint on their amino acid sequences and serve principally as spacers connecting the three conserved domains. In contrast, domains I, II, and III appear to tolerate few amino acid changes, implying that most changes in these regions impair the functioning of the molecule.
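The 11% figure follows directly from the spacer boundaries: residues 140-162 (23 residues) plus 381-405 (25 residues) give 48 of 454 residues, about 10.6%. The Python sketch below makes that arithmetic explicit and shows how the share of differences falling in the spacers would be tallied. The difference positions in the example are invented for illustration (chosen so that 19 of 30 fall in a spacer, i.e., 63%) and are not the actual MHV-A59/MHV-JHM differences; the function names are hypothetical.

```python
# Spacer boundaries in MHV-A59 residue numbering (inclusive), as given in the text.
SPACERS = {"A": (140, 162), "B": (381, 405)}
PROTEIN_LENGTH = 454  # MHV-A59 N protein


def spacer_fraction(regions, length):
    """Fraction of the protein occupied by the given inclusive regions."""
    return sum(end - start + 1 for start, end in regions.values()) / length


def fraction_in_regions(positions, regions):
    """Fraction of difference positions that fall inside any region."""
    inside = sum(
        any(start <= p <= end for start, end in regions.values()) for p in positions
    )
    return inside / len(positions)


print(round(100 * spacer_fraction(SPACERS, PROTEIN_LENGTH), 1))  # 10.6

# Hypothetical difference positions: 19 inside spacer A, 11 in the conserved domains.
example_diffs = list(range(140, 159)) + list(range(10, 120, 10))
print(round(100 * fraction_in_regions(example_diffs, SPACERS)))  # 63
```

The comparison that motivates the domain model is between these two numbers: differences are roughly six times more concentrated in the spacers than their share of the sequence would predict.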


This model is supported by two further observations. First, we have characterized a temperature-sensitive N protein mutant of MHV-A59 that has a deletion almost exactly coincident with spacer B, indicating that, at least at the permissive temperature, this region is not absolutely required for N protein function (C. A. Koetzner et al., unpublished results). Second, in an in vitro assay system, domains I and III were found to be dispensable for the binding of N protein to RNA, suggesting that the RNA-binding activity of N resides in domain II (P. S. Masters, manuscript in preparation). Thus, the domains inferred from our amino acid sequence comparison may be functionally separable as well as structurally distinct.

Fig. 3. Schematic representation of amino acid differences among the N proteins of five strains of MHV and a three-domain model of the MHV N protein. The line at the top indicates numbers of amino acid residues. In the rectangle for each MHV strain, a vertical line represents an amino acid difference with respect to the prototype MHV-A59. At the bottom is shown a model for the MHV N protein with three domains separated by two spacer regions (A and B).

It is noteworthy that the nonconserved residues in spacers A and B tend to vary among a limited set of two or three alternatives (Fig. 2). This might have suggested that these two regions are required to vary coordinately; that is, an "A59-like" spacer A must always pair with an A59-like spacer B, and a "JHM-like" spacer A must always pair with a JHM-like spacer B. However, the N protein of MHV-S clearly rules out this possibility, since it has a JHM-like spacer A and an A59-like spacer B (Figs. 2 and 3). Thus, the MHV-S N gene is likely to have arisen from a recombination event between two ancestral viruses: one having an N gene more similar to those of MHV-A59 and MHV-3, and the other having an N gene more similar to those of MHV-1 and MHV-JHM. RNA recombination among murine coronaviruses has been shown to occur both in tissue culture and in the brains of doubly infected animals (19, 20). All five N gene sequences compared here, then, appear to be accounted for either by drift or by recombination plus drift from two prototype genes.
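The recombination argument rests on scoring each spacer separately against its alternative forms and asking whether the labels agree. A minimal Python sketch of that reasoning, assuming the spacer segments have already been extracted and aligned; the short sequences below are hypothetical placeholders, not the real MHV spacers, and the function names are illustrative:

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length aligned segments."""
    if len(a) != len(b):
        raise ValueError("segments must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_type(segment, references):
    """Name of the closest reference segment, or "ambiguous" on a tie."""
    ranked = sorted((hamming(segment, seq), name) for name, seq in references.items())
    if len(ranked) > 1 and ranked[0][0] == ranked[1][0]:
        return "ambiguous"
    return ranked[0][1]


def spacer_pattern(query, prototypes):
    """Classify each spacer of `query` against the same spacer in every prototype."""
    return {
        spacer: closest_type(seq, {name: p[spacer] for name, p in prototypes.items()})
        for spacer, seq in query.items()
    }


# Hypothetical spacer segments for two prototype lineages and one query strain.
prototypes = {
    "A59-like": {"A": "SSRAQ", "B": "KQSG"},
    "JHM-like": {"A": "GTNKD", "B": "RLPA"},
}
query = {"A": "GTNKE", "B": "KQSA"}
print(spacer_pattern(query, prototypes))  # {'A': 'JHM-like', 'B': 'A59-like'}
```

A query whose two spacers map to different prototypes, as in this toy case, shows the mosaic pattern the text attributes to MHV-S; a strictly coordinated model would predict matching labels for both spacers.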

Fig. 4. Amino acid sequence comparison of the major internal open reading frames of the N genes of five strains of MHV. The deduced MHV-A59, -3, -S, and -1 sequences are from this work. The corresponding region of MHV-JHM is deduced from Skinner and Siddell (16). Spaces indicate positions for which the amino acid is identical to that of MHV-A59. The asterisk in the MHV-JHM sequence indicates a stop codon.

Four of the five MHV N genes in Fig. 1 contain a potentially significant internal open reading frame (ORF) in the +1 reading frame relative to the N protein ORF, beginning at nt 65 and terminating at nt 688. In each case, the protein encoded by this ORF is 207 residues in length (22.6-22.9 kDa) and is distinguished by a large excess of basic residues (calculated pI values of 10.6-11.1) as well as a relatively high (17%) leucine content (Fig. 4). The MHV-JHM N gene contains a very similar ORF in the same position, but it is interrupted by a stop codon following the 16th amino acid residue. For all of the N genes, the start codon for the internal ORF occurs in a strong context for translation initiation, whereas the N protein start codon (nt 1-3) and an intervening start codon (nt 26-28) both fall in suboptimal contexts. Thus, the internal ORF may be translated by means of a "leaky scanning" mechanism (21). Leucine-rich internal ORFs have also been noted within the N genes of bovine coronavirus (22) and human coronavirus 229E (23). The significance of these potential polypeptides awaits determination of whether any of them is actually synthesized in coronavirus-infected cells.
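Two checks in the preceding paragraph can be made mechanical: that an ORF running from nt 65 to nt 688 lies in the +1 frame and encodes 207 residues, and whether a start codon sits in a favorable context. The Python sketch below assumes the widely used Kozak criteria (a purine at -3 and a G at +4, counting the A of ATG as +1). The "strong/adequate/weak" labels and all function names are illustrative assumptions rather than definitions taken from the source, and only the first ATG of each ORF is reported.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_of(nt):
    """Reading-frame offset (0, 1, 2) of a 1-based position, relative to the N start codon at nt 1."""
    return (nt - 1) % 3


def find_orfs(seq, frame):
    """ATG-to-stop ORFs in one frame as (start_nt, end_nt, residues), 1-based inclusive.

    Only the first ATG of each ORF is used, so nested in-frame ATGs are not reported.
    """
    seq = seq.upper()
    orfs, start = [], None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            orfs.append((start + 1, i + 3, (i - start) // 3))
            start = None
    return orfs


def kozak_context(seq, atg_nt):
    """Classify the context of the ATG whose A is at 1-based position `atg_nt`."""
    i = atg_nt - 1
    purine_minus3 = i >= 3 and seq[i - 3].upper() in "AG"
    g_plus4 = i + 3 < len(seq) and seq[i + 3].upper() == "G"
    if purine_minus3 and g_plus4:
        return "strong"
    return "adequate" if (purine_minus3 or g_plus4) else "weak"


# Coordinates from the text: nt 65 is in the +1 frame, and nt 65-688 spans 207 codons plus a stop.
print(frame_of(65), (688 - 65 + 1) // 3 - 1)  # 1 207
```

Under leaky scanning, ribosomes that pass over a start codon in a weaker context can initiate at a downstream ATG in a stronger one, which is why the relative contexts of the N and internal-ORF start codons matter.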